Better Arabic Parsing: Baselines, Evaluations, and Analysis
نویسندگان
چکیده
In this paper, we offer broad insight into the underperformance of Arabic constituency parsing by analyzing the interplay of linguistic phenomena, annotation choices, and model design. First, we identify sources of syntactic ambiguity understudied in the existing parsing literature. Second, we show that although the Penn Arabic Treebank is similar to other treebanks in gross statistical terms, annotation consistency remains problematic. Third, we develop a human interpretable grammar that is competitive with a latent variable PCFG. Fourth, we show how to build better models for three different parsers. Finally, we show that in application settings, the absence of gold segmentation lowers parsing performance by 2–5% F1.
منابع مشابه
A Computational Approach to Zero-pronouns in Spanish
In this paper, a computational approach for resolving zero-pronouns in Spanish texts is proposed. Our approach has been evaluated with partial parsing of the text and the results obtained show that these pronouns can be resolved using similar techniques that those used for pronominal anaphora. Compared to other well-known baselines on pronominal anaphora resolution, the results obtained with ou...
متن کاملMulti-Path Feedback Recurrent Neural Networks for Scene Parsing
In this paper, we consider the scene parsing problem and propose a novel MultiPath Feedback recurrent neural network (MPF-RNN) for parsing scene images. MPF-RNN can enhance the capability of RNNs in modeling long-range context information at multiple levels and better distinguish pixels that are easy to confuse. Different from feedforward CNNs and RNNs with only single feedback, MPFRNN propagat...
متن کاملPredicting Scene Parsing and Motion Dynamics in the Future
The ability of predicting the future is important for intelligent systems, e.g. autonomous vehicles and robots to plan early and make decisions accordingly. Future scene parsing and optical flow estimation are two key tasks that help agents better understand their environments as the former provides dense semantic information, i.e. what objects will be present and where they will appear, while ...
متن کاملTwo Languages are Better than One (for Syntactic Parsing)
We show that jointly parsing a bitext can substantially improve parse quality on both sides. In a maximum entropy bitext parsing model, we define a distribution over source trees, target trees, and node-to-node alignments between them. Features include monolingual parse scores and various measures of syntactic divergence. Using the translated portion of the Chinese treebank, our model is traine...
متن کاملStatistical Parsing by Machine Learning from a Classical Arabic Treebank
Research into statistical parsing for English has enjoyed over a decade of successful results. However, adapting these models to other languages has met with difficulties. Previous comparative work has shown that Modern Arabic is one of the most difficult languages to parse due to rich morphology and free word order. Classical Arabic is the ancient form of Arabic, and is understudied in computa...
متن کامل